Target Word Selection Using WordNet and Data-Driven Models in Machine Translation
Authors
Abstract
Collocation information plays an important role in target word selection for machine translation. However, a collocation dictionary covers only a limited portion of selection operations because of data sparseness. To resolve the sparseness problem, we propose a new methodology that selects target words after determining an appropriate collocation class by using inter-word semantic similarity. We estimate the similarity by computing the semantic distance between two synsets in WordNet and the term-to-term similarity in data-driven models. In WordNet, the semantic similarity between two words can be calculated as the reciprocal of the Semantic Distance (SD). For the calculation of the SD, each synset in WordNet is assigned an M-value, computed as $M\text{-value} = \mathit{radix} / \mathit{sf}^{\,p}$, where $\mathit{radix}$ is an initial M-value, $\mathit{sf}$ is a scale factor, and $p$ is the number of edges from the root to the synset. As the data-driven models, we use Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA), a probabilistic variant of LSA. LSA applies singular value decomposition (SVD) to a matrix $A$ whose rows correspond to terms. SVD is a form of factor analysis and is defined as $A = U \Sigma V^{T}$, where $\Sigma$ is a diagonal matrix composed of the square roots of the nonzero eigenvalues of $AA^{T}$ or $A^{T}A$, and $U$ and $V$ are the orthogonal eigenvectors associated with the $r$ nonzero eigenvalues of $AA^{T}$ and $A^{T}A$, respectively. The term-to-term similarity is based on the inner products between two row vectors of $A$, $AA^{T} = U \Sigma^{2} U^{T}$. To compute the similarity of $w_1$ and $w_2$ in PLSA, $P(z \mid w_1) P(z \mid w_2)$ is approximately computed, derived from $P(z \mid w) = P(z) P(w \mid z) / \sum$ ...
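The PLSA step in the abstract can be illustrated with a minimal Python sketch. It assumes the standard Bayes normalization, in which $P(z \mid w)$ is $P(z) P(w \mid z)$ divided by the sum of the same product over all latent classes, and it assumes that the per-class products $P(z \mid w_1) P(z \mid w_2)$ are summed over $z$ to give a single score. The function names and the toy probabilities are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def posterior_over_classes(
    word: str, p_z: List[float], p_w_given_z: List[Dict[str, float]]
) -> List[float]:
    """Return P(z|w) for every latent class z using Bayes' rule."""
    # Joint P(z) * P(w|z) for each latent class z.
    joint = [pz * pwz.get(word, 0.0) for pz, pwz in zip(p_z, p_w_given_z)]
    total = sum(joint)  # normalizer: sum over all classes of P(z) * P(w|z)
    if total == 0.0:
        raise ValueError(f"word {word!r} has zero probability under every class")
    return [j / total for j in joint]


def plsa_similarity(
    w1: str, w2: str, p_z: List[float], p_w_given_z: List[Dict[str, float]]
) -> float:
    """Assumed score: sum over z of P(z|w1) * P(z|w2)."""
    post1 = posterior_over_classes(w1, p_z, p_w_given_z)
    post2 = posterior_over_classes(w2, p_z, p_w_given_z)
    return sum(a * b for a, b in zip(post1, post2))


# Toy model with two latent classes (illustrative numbers only).
P_Z = [0.5, 0.5]
P_W_GIVEN_Z = [
    {"bank": 0.6, "money": 0.4, "river": 0.0},
    {"bank": 0.2, "money": 0.0, "river": 0.8},
]

if __name__ == "__main__":
    for pair in [("bank", "money"), ("bank", "river"), ("money", "river")]:
        print(pair, plsa_similarity(pair[0], pair[1], P_Z, P_W_GIVEN_Z))
```

In this toy model, "bank" scores higher with "money" than with "river", because most of its posterior mass falls on the first latent class, which is the only class that can generate "money".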
Similar resources
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is mainly used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...
Extending Bilingual WordNet via Hierarchical Word Translation Classification
We introduce a method for learning to assign word senses to translation pairs. In our approach, this sense assignment or disambiguation problem is transformed into one on how to navigate through a sense network like WordNet aimed at distinguishing the more adequate senses from others. The method involves automatically constructing classification models for branching nodes in the network, and au...
Example-Based Machine Translation for Low-Resource Language Using Chunk-String Templates
Example-Based Machine Translation (EBMT) for low-resource languages such as Bengali suffers from low coverage because parallel corpora are scarce. In this paper, we propose an EBMT system for low-resource languages that uses chunk-string templates (CSTs) and translates unknown words. CSTs consist of a chunk in the source language, a string in the target language, and word alignment information. CSTs are prepared aut...
Example Based English-Bengali Machine Translation Using WordNet
In this paper, we propose an architecture for English-Bengali Example-Based Machine Translation (EBMT) using WordNet. The proposed EBMT system has five steps: 1) tagging; 2) parsing; 3) preparing the chunks of the sentence using sub-sentential EBMT; 4) matching the sentence rule using an efficient adapting scheme; 5) translating from the source language (English) to the target language (Bengali) in the chunk and ge...
Automatic Target Word Disambiguation Using Syntactic Relationships
Multiple target translations arise because a source word can have several meanings and various target word equivalents, depending on its context. Thus, an automated approach is presented for resolving target-word selection, based on "word-to-sense" and "sense-to-word" source-translation relationships, using syntactic relationships (subject-verb, verb-object, adjective-noun). Translation...